Automatic Spoken Language Translation Template Acquisition Based on Boosting Structure Extraction and Alignment

Authors

  • Rile Hu
  • Xia Wang
Abstract

In this paper, we propose a new approach for acquiring translation templates automatically from unannotated bilingual spoken language corpora. Two basic algorithms are adopted: a grammar induction algorithm, and an alignment algorithm using Bracketing Transduction Grammar. The approach is unsupervised, statistical, data-driven, and employs no parsing procedure. The acquisition procedure consists of two steps. First, semantic groups and phrase structure groups are extracted from both the source language and the target language through a boosting procedure, in which a synonym dictionary is used to generate the seed groups of the semantic groups. Second, an alignment algorithm based on Bracketing Transduction Grammar aligns the phrase structure groups. The aligned phrase structure groups are post-processed, yielding translation templates. Preliminary experimental results show that the algorithm is effective.
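
As a rough illustration of the first step described in the abstract, the sketch below seeds semantic groups from a synonym dictionary and replaces words with group labels, so that sentences differing only in synonymous words collapse to the same pattern. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the `SG` label format, and the toy data are all hypothetical, and the boosting iterations and the Bracketing Transduction Grammar alignment are not shown.

```python
from collections import defaultdict


def build_seed_groups(synonym_sets):
    """Map each word to a seed group label (hypothetical 'SG<n>' format).

    Words listed together in one synonym set share a label. If a word
    appears in several sets, the first set wins.
    """
    word_to_group = {}
    for index, words in enumerate(synonym_sets):
        label = f"SG{index}"
        for word in words:
            word_to_group.setdefault(word, label)
    return word_to_group


def apply_groups(sentence, word_to_group):
    """Replace every token that belongs to a seed group with its label."""
    return [word_to_group.get(token, token) for token in sentence.split()]


def count_patterns(corpus, word_to_group):
    """Count how often each label-substituted pattern occurs in a corpus."""
    counts = defaultdict(int)
    for sentence in corpus:
        counts[tuple(apply_groups(sentence, word_to_group))] += 1
    return dict(counts)


# Toy data, invented for illustration only.
synonym_sets = [["flight", "plane"], ["ticket", "fare"]]
word_to_group = build_seed_groups(synonym_sets)
patterns = count_patterns(["book a flight", "book a plane"], word_to_group)
```

Here both toy sentences reduce to the single pattern `("book", "a", "SG0")`. Patterns of this kind, extracted on the source and target sides, are the sort of structure that a later alignment step would pair up.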


Related resources

Approach to Automatic Translation Template Acquisition Based on Unannotated Bilingual Grammar Induction

In this paper, we propose a new approach that automatically acquires translation templates from unannotated bilingual spoken language corpora in the travel information access domain. The approach adopts two basic algorithms: a grammar induction algorithm and a dynamic programming algorithm. Our approach is an unsupervised, statistical, data-driven method which avoids the...


Automatic Translation Template Acquisition Based on Bilingual Structure Alignment

Knowledge acquisition is a bottleneck in machine translation and many NLP tasks. This paper proposes a method for automatically acquiring translation templates from bilingual corpora. Bilingual sentence pairs are first aligned in syntactic structure by combining language parsing with a statistical bilingual language model. The alignment results are used to extract translation templates ...


Automatic extraction of differences between spoken and written languages, and automatic translation from the written to the spoken language

We extracted the differences between spoken language and written language from a spoken-language corpus and a written-language corpus by using the UNIX command "diff" and examined the differences to determine the construction of the grammars of the two corpora. We also transformed written-language sentences into spoken-language sentences by using rules based on the extracted differences.


Set-Phrase Machine Translation Based on Multilingual Dictionaries

The paper focuses on automatically compiling set-phrase dictionaries for machine translation systems. The methods employed are based on translation memory acquisition principles and heuristic language processing tools. Machine learning techniques are used to extract new rules and templates.


Tightly integrated spoken language understanding using word-to-concept translation

This paper discusses an integrated spoken language understanding method using a statistical translation model from words to semantic concepts. The translation model is an N-gram-based model that can easily be integrated with speech recognition. It can be trained using annotated corpora where only sentence-level alignments between word sequences and concept sets are available, by automatic alignm...




Publication date: 2006